Creating a Decision Tree

  1. In the Continuous Troubleshooter, at Step 3: Modeling, the Launch Decision Tree icon in the toolbar becomes active. Click this icon to launch the decision tree.

  2. Select Fields For Model:

    • Select the input and target fields to use from the list of available fields. Input fields may be of Continuous (Integer or Double) or Discrete (String, Integer, or Double) data types; the target field must be of a Discrete data type.

    • Click [Ok] when done.

  3. The decision tree that is created shows the following (the sketch after this list illustrates how these per-node statistics are computed):

    • the root node, which represents all the data of the process

    • the classification types, and the number of data points in each class

    • the percentage distribution of the data that falls into each classification, also displayed graphically

    • the target class, shown below the classifications
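
    The counts and percentages displayed at each node follow directly from the target values of the rows that reach that node. The following is a rough Python illustration of how such per-node statistics can be computed; it is a generic sketch, not CSense's internal code, and the class names are made up:

      from collections import Counter

      def node_summary(target_values):
          """Return (count, percentage) per class for one node's data."""
          counts = Counter(target_values)
          total = len(target_values)
          return {cls: (n, 100.0 * n / total) for cls, n in counts.items()}

      # The root node sees all rows of the process data.
      root_target = ["Pass", "Pass", "Fail", "Pass", "Fail"]
      for cls, (n, pct) in node_summary(root_target).items():
          print(f"{cls}: {n} points ({pct:.1f}%)")  # e.g. Pass: 3 points (60.0%)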

  4. The decision tree can now be grown automatically, or nodes can be added manually and customized individually.

    Create tree automatically
    : Select the root node (a blue border frames it when selected), then click the [Create Tree] icon to induce the child nodes from the root of the decision tree automatically. Alternatively, right-click the root node and select [Create Tree].

    When a decision tree is created automatically, one of three popular impurity-measure methodologies is applied to induce the classification tree. Select the Impurity Measure method to use before inducing the tree (a computational sketch of all three measures follows this list):

    • Entropy Gain: chooses the split that provides the maximum information about the class. Entropy gain, also known as Information Gain, measures the reduction in entropy (the uncertainty associated with a random variable) that a node split achieves.

    • Chi Square: splits the population into subpopulations with significantly different distributions. A Chi-square test is performed: for each class, the expected count is subtracted from the actual count, the difference is squared, and the result is divided by the expected count. The values for all classes are added together to arrive at a score, which is a measure of the purity of the split. A high score means that the proposed split successfully separates the population into subpopulations with significantly different distributions.

    • Gini Index: splits off a single group of as large a size as possible. Gini impurity is based on the squared probabilities of membership for each target category in the node. It reaches its maximum value when the class sizes at the node are equal, and its minimum (zero) when all cases in the node fall into a single target category.

      The induced tree automatically classifies the data into the classes of the selected target field. The independent variables serve as predictor attributes at the decision nodes, and the algorithm identifies the split that best explains the dependent variable as a function of the independent variables, continuing recursively down the tree until the stopping criteria are met.
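
      The three impurity measures can be illustrated with a short computational sketch. The following Python code implements the standard formulas described above; it is a generic illustration with made-up class counts, not CSense's implementation:

        import math

        def entropy(counts):
            """Shannon entropy (in bits) of a class-count distribution."""
            total = sum(counts)
            return -sum((c / total) * math.log2(c / total) for c in counts if c)

        def entropy_gain(parent, children):
            """Information gain: parent entropy minus the size-weighted
            entropy of the child nodes produced by the split."""
            total = sum(parent)
            weighted = sum(sum(ch) / total * entropy(ch) for ch in children)
            return entropy(parent) - weighted

        def chi_square(parent, children):
            """Chi-square score: for each child and class, subtract the
            expected count (assuming the parent's class proportions) from
            the actual count, square the difference, divide by the expected
            count, and sum everything."""
            total = sum(parent)
            score = 0.0
            for ch in children:
                size = sum(ch)
                for cls, actual in enumerate(ch):
                    expected = size * parent[cls] / total
                    if expected:
                        score += (actual - expected) ** 2 / expected
            return score

        def gini(counts):
            """Gini impurity: 1 minus the sum of squared class probabilities.
            Zero when only one class is present; maximal when classes are equal."""
            total = sum(counts)
            return 1.0 - sum((c / total) ** 2 for c in counts)

        # Parent node: 50 cases each of two classes; the candidate split
        # produces two nearly pure child nodes.
        parent = [50, 50]
        children = [[45, 5], [5, 45]]
        print("entropy gain:", entropy_gain(parent, children))     # ~0.531 bits
        print("chi-square score:", chi_square(parent, children))   # 64.0
        print("gini of children:", [gini(ch) for ch in children])  # [0.18, 0.18]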

  5. Create child node: A single node can be created by clicking the [Create Child Node] icon, or by selecting the parent node, right-clicking, and selecting [Create Child Node]. You are given the option to induce the node automatically or to customize the criteria of the split.

    Child Node Creation window: each of the input fields can be selected as a variable attribute. The fields are automatically ranked according to how much information they provide for the split. Select the input field to use as the classification variable, and then select [Automatic] or [Custom] (a sketch of how such a ranking can be computed follows these two options).

  • Automatic: Select the highest-ranked input variable, which provides the most information for the best split. The value at which the split occurs is determined automatically, giving the most accurate classification for the particular data set.

  • Custom: Select an input variable (the ranking indicates which provides the most information) and enter the values at which the classification split is performed. Click [Calculate ranking] to calculate the ranking of the split for the selected parameter. Click [Ok] when done.
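
    The ranking of input fields and the automatically determined split value can be sketched by scoring each candidate field by the information gain of its best threshold. The following Python sketch assumes Entropy Gain as the impurity measure; the field names and data are hypothetical, and CSense's actual ranking procedure may differ:

      import math

      def entropy(labels):
          total = len(labels)
          return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                      for c in set(labels))

      def best_threshold(values, labels):
          """Try a split between each pair of adjacent distinct values and
          return (best_gain, best_threshold) for this input field."""
          parent = entropy(labels)
          pairs = sorted(zip(values, labels))
          best_gain, best_thr = 0.0, None
          for i in range(1, len(pairs)):
              if pairs[i - 1][0] == pairs[i][0]:
                  continue
              thr = (pairs[i - 1][0] + pairs[i][0]) / 2
              left = [lab for v, lab in pairs if v <= thr]
              right = [lab for v, lab in pairs if v > thr]
              gain = parent - (len(left) * entropy(left)
                               + len(right) * entropy(right)) / len(pairs)
              if gain > best_gain:
                  best_gain, best_thr = gain, thr
          return best_gain, best_thr

      # Hypothetical input fields and a discrete target.
      data = {"Temperature": [61, 72, 68, 75, 58, 80],
              "Pressure":    [1.3, 1.1, 1.2, 1.4, 1.5, 1.0]}
      labels = ["Fail", "Pass", "Fail", "Pass", "Fail", "Pass"]
      ranking = sorted(((field,) + best_threshold(vals, labels)
                        for field, vals in data.items()),
                       key=lambda item: item[1], reverse=True)
      for field, gain, thr in ranking:
          print(f"{field}: gain = {gain:.3f} at split value {thr}")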

  6. Export Rules: Select the [Export Rules] icon to export the rules generated by the splits in text format.
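
    An exported rule set is essentially one if-then rule per leaf, built from the split conditions on the path from the root to that leaf. The following Python sketch shows the general idea with a hypothetical tree; the exact text format that CSense exports may differ:

      def export_rules(node, conditions=()):
          """Walk the tree depth-first; each leaf yields one IF ... THEN rule."""
          if "cls" in node:  # leaf: the path conditions form one rule
              return ["IF " + " AND ".join(conditions)
                      + f" THEN class = {node['cls']}"]
          field, thr = node["field"], node["threshold"]
          return (export_rules(node["left"], conditions + (f"{field} <= {thr}",))
                  + export_rules(node["right"], conditions + (f"{field} > {thr}",)))

      tree = {"field": "Temperature", "threshold": 70,
              "left": {"cls": "Fail"},
              "right": {"field": "Pressure", "threshold": 1.3,
                        "left": {"cls": "Pass"}, "right": {"cls": "Fail"}}}
      print("\n".join(export_rules(tree)))
      # IF Temperature <= 70 THEN class = Fail
      # IF Temperature > 70 AND Pressure <= 1.3 THEN class = Pass
      # IF Temperature > 70 AND Pressure > 1.3 THEN class = Fail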

Zoom features

  • Zoom in:

    Zoom in on the tree

  • Zoom out:

    Zoom out on the tree

  • Zoom to fit:

    Zoom to fit the entire tree in the view

  • Zoom default:

    Zoom to the default view size with the root node centered at the top.

Node Options

The following options are available from the right-click menu on any decision tree node:

  • Create tree:

    Automatically creates the child nodes below the selected node.

  • Create child nodes:

    Creates child nodes for one level below the selected node. Creation can be automatic or customized.

  • Delete child nodes:

    Deletes all the child nodes below the selected node.

  • View node data:

    Opens the database containing the raw data for the selected node.



CSense 2023 - Last updated: June 24, 2025